16. Categorical Features & Feature Column API

TensorFlow Feature Column API for Categorical Features Key Points

To build categorical features with the TensorFlow Feature Column API, follow these steps:

  1. Select the categorical feature columns.
  2. Create a vocabulary text file containing every unique value of a given categorical feature, with a placeholder value for out-of-vocabulary (OOV) values in the first row.
    • Note: For columns with a small number of unique values, you can usually skip this step and instead pass an array or list to the categorical_column_with_vocabulary_list function.
    • A separate vocabulary file is particularly useful for features with high cardinality. It also lets you generate the vocab files with other tools, such as SQL, on much larger datasets, and decouples this step from your model code if you already create these files for data profiling purposes.
  3. Create your new feature by passing the vocabulary column to the function that builds the final derived feature. This can be an embedding or a one-hot encoded feature. In this example, we created a one-hot encoded feature with the indicator_column function.
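
Step 2 can be sketched in plain Python as follows. This is a minimal sketch, not the course's own helper: the function name write_vocab_file, the placeholder value "00", and the toy diagnosis codes are all assumptions made for illustration.

```python
import os
import tempfile


def write_vocab_file(values, path, oov_placeholder="00"):
    """Write one unique value per line, with the OOV placeholder in the first row.

    Hypothetical helper: the name and the default placeholder are assumptions.
    """
    # Keep first-seen order; skip missing values and the placeholder itself
    seen = {oov_placeholder}
    vocab = [oov_placeholder]
    for value in values:
        if value is None:
            continue
        token = str(value)
        if token not in seen:
            seen.add(token)
            vocab.append(token)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    return vocab


# Toy diagnosis codes standing in for a real column of values
codes = ["250.00", "401.9", "250.00", None, "428.0", "401.9"]
vocab_path = os.path.join(tempfile.mkdtemp(), "principal_diagnosis_vocab.txt")
vocab = write_vocab_file(codes, vocab_path)
print(vocab)  # ['00', '250.00', '401.9', '428.0']
```

With a pandas DataFrame, the values argument could be something like categorical_example_df["PRINCIPAL_DIAGNOSIS_CODE"].dropna().unique(). Dropping missing values first matters because the sketch only skips None and would otherwise write NaN to the file as a token.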

Example:

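import tensorflow as tf
# vocab_files_list[0] is assumed to hold the path to this feature's vocabulary file;
# num_oov_buckets=1 reserves one extra bucket for values missing from the vocabulary
principal_diagnosis_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key="PRINCIPAL_DIAGNOSIS_CODE", vocabulary_file=vocab_files_list[0], num_oov_buckets=1)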
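
The example above stops at the vocabulary column, so here is a hedged sketch of how step 3 might continue from it. It assumes a TensorFlow 2.x release that still ships the tf.feature_column module (recent releases mark it as deprecated). The toy vocabulary file, the GENDER key, and its values are made up for illustration and are not from the course dataset.

```python
import os
import tempfile

import tensorflow as tf

# Toy vocabulary file: placeholder in the first row, then three made-up codes
vocab_path = os.path.join(tempfile.mkdtemp(), "principal_diagnosis_vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("00\n250.00\n401.9\n428.0\n")

# Categorical column backed by the vocabulary file, as in the example above
principal_diagnosis_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key="PRINCIPAL_DIAGNOSIS_CODE", vocabulary_file=vocab_path, num_oov_buckets=1)

# Step 3: wrap the categorical column to get a one-hot encoded feature
principal_diagnosis_one_hot = tf.feature_column.indicator_column(principal_diagnosis_vocab)

# For a low-cardinality feature, an in-memory list can replace the file
# (the GENDER key and its values are hypothetical)
gender_vocab = tf.feature_column.categorical_column_with_vocabulary_list(
    key="GENDER", vocabulary_list=["F", "M"])
gender_one_hot = tf.feature_column.indicator_column(gender_vocab)

# Number of buckets = lines in the vocab file + OOV buckets
print(principal_diagnosis_vocab.num_buckets)
```

For a feature with thousands of unique values, tf.feature_column.embedding_column could be used in place of indicator_column to get a dense, lower-dimensional representation.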

Additional Resources

Code

If you need the code, it is available at https://github.com/udacity.

Categorical Feature Columns

Which of the following is/are true about working with categorical features and the TensorFlow Dataset API?

SOLUTION:
  • `categorical_example_df['PRINCIPAL_DIAGNOSIS_CODE'].nunique()` returns `3929`. With that many unique values, use a vocab file!
  • When you build your TF dataset, you should pass a predictor field, which is also known as the label.
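
To illustrate the second point, here is a minimal sketch of a TF dataset built from features paired with a label. The feature values and labels are toy data, and pairing them through tf.data.Dataset.from_tensor_slices is one common approach, not necessarily the one used in the course notebook.

```python
import tensorflow as tf

# Toy rows; the values are made up for illustration
features = {"PRINCIPAL_DIAGNOSIS_CODE": ["250.00", "401.9", "428.0", "250.00"]}
labels = [1, 0, 0, 1]  # hypothetical predictor field (the label)

# Each dataset element is a (features dict, label) pair
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

for feature_batch, label_batch in dataset.take(1):
    print(feature_batch["PRINCIPAL_DIAGNOSIS_CODE"].numpy(), label_batch.numpy())
```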